knitr::opts_chunk$set(echo = TRUE)
# Load libraries for homework problems
library(tidyverse)
library(gt)
library(patchwork)
# Read in COVID-19 data
# R/make_data.R creates this file
cv19 <- read_csv('data/usa_covid19.csv')
The COVID-19 pandemic is an ongoing public health emergency in the United States (US) and worldwide. Since 2020-01-21, the New York Times has monitored and shared COVID-19 data (see the GitHub repo here) from across the US at the state and county level.
I have modified the New York Times data to include each state's population. The data are described below:
c("date" = "Date",
"state" = "State in the US",
"cases_total" = "Total number of cases as of date",
"deaths_total" = "Total number of deaths as of date",
"pop_2015" = "Estimated population as of 2015"
) %>%
enframe() %>%
gt(rowname_col = "name") %>%
tab_stubhead(label = 'Variable name') %>%
cols_label(value = 'Variable description') %>%
cols_align('right') %>%
tab_footnote(locations = cells_body(rows = 5, columns = 2),
footnote = "Source: usmap::countypop") %>%
tab_footnote(locations = cells_body(columns = 2, rows = 2),
footnote = 'US = United States') %>%
tab_header(title = 'Dictionary for New York Times COVID-19 data',
subtitle = paste("Last updated:", max(cv19$date)))
Dictionary for New York Times COVID-19 data
Last updated: 2020-06-03

| Variable name | Variable description |
|---|---|
| date | Date |
| state | State in the US¹ |
| cases_total | Total number of cases as of date |
| deaths_total | Total number of deaths as of date |
| pop_2015 | Estimated population as of 2015² |

¹ US = United States
² Source: usmap::countypop
The data (cv19) are printed below:
cv19
Create two new columns in cv19:
cases_new: the number of new cases identified on a given day for a given state.
deaths_new: the number of new deaths confirmed on a given day for a given state.
Notes:
The lag() function is helpful for this.
Your solution should look like this:
read_rds('solutions/01_solution.rds')
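As a rough illustration of the lag() hint, the sketch below works on a small made-up data frame rather than cv19. The object names `toy` and `toy_new` and all of the numbers are assumptions for demonstration only, and treating the first day's total as entirely new (via `default = 0`) is one possible choice, not necessarily the one used in the posted solution.

```r
library(dplyr)

# Toy cumulative counts (made-up numbers, not the real cv19 data)
toy <- tibble(
  date = as.Date("2020-03-01") + rep(0:2, times = 2),
  state = rep(c("A", "B"), each = 3),
  cases_total = c(1, 3, 6, 0, 2, 2)
)

toy_new <- toy %>%
  group_by(state) %>%
  arrange(date, .by_group = TRUE) %>%
  # lag() returns the previous row's value within each state;
  # default = 0 treats the first day's total as all new cases (an assumption)
  mutate(cases_new = cases_total - lag(cases_total, default = 0)) %>%
  ungroup()

toy_new
```

Grouping by state before calling lag() matters: without it, the first row of one state would be differenced against the last row of another.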
Compute the total number of new cases identified and deaths confirmed each day in the USA on or after March 1st, 2020. Your summarized data should look like this:
read_rds('solutions/02_solution.rds')
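One way to think about this kind of daily summary is a filter on date followed by a grouped sum. The sketch below uses a made-up data frame (`toy_new`, with hypothetical values) in place of the problem 1 result, so it shows the general pattern rather than the exact solution.

```r
library(dplyr)

# Toy state-level daily counts (made-up numbers)
toy_new <- tibble(
  date = as.Date(c("2020-02-29", "2020-03-01", "2020-03-01", "2020-03-02")),
  state = c("A", "A", "B", "B"),
  cases_new = c(5, 2, 3, 4),
  deaths_new = c(0, 1, 0, 2)
)

toy_daily <- toy_new %>%
  # keep rows on or after March 1st, 2020
  filter(date >= as.Date("2020-03-01")) %>%
  # one output row per date, summing over states
  group_by(date) %>%
  summarize(
    cases_new = sum(cases_new),
    deaths_new = sum(deaths_new),
    .groups = "drop"
  )

toy_daily
```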
Using the data created in problem 2, create two bar plots showing the number of new cases identified and deaths confirmed each day in the USA on or after March 1st, 2020.
Notes: This is a great chance to learn about the patchwork R package.
Your solution should look like this:
read_rds('solutions/03_solution.rds')
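For readers new to patchwork, here is a minimal sketch of combining two ggplot2 bar plots into one figure. The data frame `toy_daily` and its values are made up for illustration, and the stacked layout is only one option; the posted solution may arrange or style the plots differently.

```r
library(ggplot2)
library(patchwork)

# Toy daily totals (made-up numbers)
toy_daily <- data.frame(
  date = as.Date("2020-03-01") + 0:4,
  cases_new = c(5, 8, 13, 9, 11),
  deaths_new = c(0, 1, 1, 2, 1)
)

p_cases <- ggplot(toy_daily, aes(x = date, y = cases_new)) +
  geom_col() +
  labs(x = NULL, y = "New cases")

p_deaths <- ggplot(toy_daily, aes(x = date, y = deaths_new)) +
  geom_col() +
  labs(x = "Date", y = "New deaths")

# patchwork overloads `/` to stack plots vertically;
# `+` or `|` would place them side by side instead
p_both <- p_cases / p_deaths
p_both
```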
Using the data from problem 1, create the table below:
read_rds('solutions/04_solution.rds')
Ten states in the US with highest death rates due to COVID-19
Data presented for: 2020-06-03

| State | Cases: total count | Cases: rate per 100k | Cases: moving average ratio | Deaths: total count | Deaths: rate per 100k | Deaths: moving average ratio |
|---|---|---|---|---|---|---|
| New York | 378,924 | 1,914.2 | 0.9 | 29,918 | 151.1 | 0.7 |
| New Jersey | 162,068 | 1,809.2 | 0.9 | 11,880 | 132.6 | 0.9 |
| Connecticut | 43,091 | 1,200.0 | 0.8 | 3,989 | 111.1 | 0.7 |
| Massachusetts | 101,592 | 1,495.2 | 1.4 | 7,152 | 105.3 | 1.3 |
| District of Columbia | 9,016 | 1,341.2 | 0.7 | 473 | 70.4 | 0.7 |
| Rhode Island | 15,219 | 1,440.8 | 0.9 | 742 | 70.2 | 0.7 |
| Louisiana | 41,244 | 883.0 | 0.8 | 2,870 | 61.4 | 1.3 |
| Michigan | 58,990 | 594.5 | 1.3 | 5,579 | 56.2 | 0.9 |
| Pennsylvania | 77,871 | 608.2 | 0.8 | 5,742 | 44.9 | 0.9 |
| Illinois | 124,279 | 966.4 | 0.7 | 5,665 | 44.1 | 1.0 |
Hints:
To create the dataset that will generate this table, you will likely need to:
- filter the data from problem 1 to contain the 2 most recent weeks.
- derive the following columns:
  - cases_per100k: Number of cases per 100,000 citizens
  - deaths_per100k: Number of deaths per 100,000 citizens
  - cases_mar: Number of days until case count doubles, based on current day’s case count
  - deaths_mar: Number of days until death count doubles, based on current day’s death count.
- filter your data down to the 10 states that have the highest death rate per 100,000 citizens.
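The per-100,000 rate and the final filtering step can be sketched as follows. This uses a three-row made-up data frame (`toy_latest`, hypothetical values) with the same column names as cv19, keeps the top 2 rows instead of 10, and deliberately leaves out the cases_mar and deaths_mar columns, so it illustrates only part of the hints above.

```r
library(dplyr)

# Toy snapshot for the most recent date (made-up numbers)
toy_latest <- tibble(
  state = c("A", "B", "C"),
  deaths_total = c(50, 10, 30),
  pop_2015 = c(100000, 50000, 20000)
)

toy_top <- toy_latest %>%
  # rate per 100,000 = count / population * 100,000
  mutate(deaths_per100k = deaths_total / pop_2015 * 1e5) %>%
  # keep the n rows with the largest rate, sorted from highest to lowest
  slice_max(deaths_per100k, n = 2)

toy_top
```

Note that slice_max() keeps tied rows by default, so the result can contain more than n rows when rates are exactly equal.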
Learn something new: take a look at a famous flipbook created by Gina Reynolds. The cv19 data have a very similar structure to that of the flipbook in Gina’s talk. Learn about the ggplot2 tools that are used in the flipbook and try to adapt them to create the ‘racing bar chart’ below.